Abstract In Ruckdeschel (2010a), we derive an asymptotic expansion of the maximal
mean squared error (MSE) of location M-estimators on suitably thinned out,
shrinking gross error neighborhoods. In this paper, we compile several consequences
of this result: With the same techniques as used for the MSE, we determine higher
order expressions for the risk based on over-/undershooting probabilities as in Huber
(1968) and Rieder (1980), respectively. For the MSE problem, we tackle the problem
of second order robust optimality: In the symmetric case, we find the second order
optimal scores again of Hampel form, but to an $O(n^{-1/2})$-smaller clipping height $c$ than
in first order asymptotics. This smaller $c$ improves MSE only by $O(n^{-1})$. For the case
of unknown contamination radius we generalize the minimax inefficiency introduced
in Rieder et al. (2008) to our second order setup. Among all risk maximizing
contaminations we determine a "most innocent" one. This way we quantify the "limits
of detectability" in Huber (1997)'s definition for the purposes of robustness.
1 Motivation/introduction

This paper takes up the central result of Ruckdeschel (2010a): a uniform higher order
expansion of the mean squared error (MSE) of location M-estimators on suitably
shrinking and thinned out neighborhoods $\tilde{\mathcal{Q}}_n(r;\varepsilon_0)$, repeated as Theorem 2.1 in this
paper for easier reference. It is of the following form

$$\sup_{Q_n \in \tilde{\mathcal{Q}}_n(r;\varepsilon_0)} n\,\mathrm{MSE}(S_n, Q_n) = r^2 \sup|\psi|^2 + \mathrm{E}\,\psi^2 + \frac{r}{\sqrt{n}}\, A_1 + \frac{1}{n}\, A_2 + o(\tfrac{1}{n}) \tag{1.1}$$

Here $S_n$ is an M-estimator to scores $\psi$, and $A_1$, $A_2$ are polynomials in the
contamination radius $r$, in $b = \sup|\psi|$, and in the moment functions $t \mapsto \mathrm{E}\,\psi_t^l$, $l = 1, \ldots, 4$
and their derivatives evaluated in $t = 0$, and $\varepsilon_0$ is the breakdown point of $S_n$, i.e.
$\varepsilon_0 = \sup|\psi| / (\sup\psi - \inf\psi)$. We recognize that the speed of the convergence to the
first order as. value is one order faster in the ideal model.

In this paper we present some ramifications of this theorem, but in particular 
consider its consequences for higher order robust optimality. 

Notation 1.1 For indices we start counting with 0, so that terms of first-order asymptotics have an
index 0, second-order ones a 1 and so on. Also we abbreviate first-order, second-order and third-order by
f-o, s-o, t-o respectively, and we write f-o-o, s-o-o, and t-o-o for first, second, and third-order asymptotically
optimal respectively.



In Theorem 3.1 we take up the over- and undershooting probabilities used as
risk in Huber (1968) to determine a finite sample minimax estimator of location.
By means of a s-o expansion, we refine the corresponding f-o translation by Rieder
(1980), providing a closer link to finite sample optimality.

The closed form expressions, in particular under certain symmetry assumptions,
allow us to tackle corresponding (uniform) higher order optimality problems,
so that we may check whether Pfanzagl (1979)'s catchword "First order
efficiency implies second order efficiency" survives when passing to neighborhoods
around the ideal model. At least under symmetry, it indeed (partially) holds.

In this setting, we see that Huber-type location M-estimators remain optimal in
second order sense, and we even may determine the s-o-o clipping height $c_1 = c_1(r, n)$
which in fact is slightly lower ($O(n^{-1/2})$) than the f-o-o one. So in fact we only retain
the optimal class, not the actual optimal estimator from f-o optimality.

For situations where the radius is (partially) unknown, the concept of a minimax
radius has been introduced and determined in Rieder et al. (2008): A radius $r_0$ is
determined such that the (f-o) maximal inefficiency $\bar\rho_0(r')$ (as defined in (5.1)) is
minimized in $r' = r_0$. We translate this to the s-o setup; the s-o results in the Gaussian
location model show that neither $c_1(r_1, \cdot)$ nor the s-o minimax radius $r_1(\cdot)$ vary much in
$n$ and that for all $n$, s-o minimax inefficiency is always smaller than the corresponding
f-o one.

Asymptotics also helps to understand which contaminations are (already) dan- 
gerous: We determine the cniper contamination as a most innocent appearing least 
favorable contamination, which is shown to form a saddlepoint together with the f-o 
(s-o) optimal M-estimator. It appears to be innocent, as it produces only "outliers" 
which are hardest to detect in some sense specified in this section. 



Organization of the paper We start with the setup of one dimensional location and
recall the main theorem of Ruckdeschel (2010a) in Section 2. This result is generalized
to an over-/undershooting probability loss in Section 3.
Consequences of Theorem 2.1 as to higher order robust optimality are discussed
in Section 4. As a (partial) explanation for the good, respectively excellent behavior
of f-o-o, s-o-o and t-o-o procedures as to numerically exact finite maximal MSE, we
present an argument based on a functional implicit function theorem in Subsection 4.2.
For decisions upon the procedure to take, only relative risk is relevant, which is
discussed in some detail in Subsection 4.3. Section 5 then considers further supplementary
results to Theorem 2.1: a s-o variant of the minimax radius and s-o cniper
contaminations. The proofs to the theorems and propositions of this paper are collected
in Section A.



2 Setup

2.1 One-dimensional location

We consider estimation of parameter $\theta$ in a one-dimensional location model, i.e.

$$X_i = \theta + v_i, \qquad v_i \overset{\text{i.i.d.}}{\sim} F, \qquad P_\theta = \mathcal{L}(X_i) \tag{2.1}$$

for some ideal distribution $F$ with finite Fisher information of location $\mathcal{I}(F)$, i.e.

$$\Lambda_f = -f'/f \in L_2(F), \qquad \mathcal{I}(F) = \mathrm{E}[\Lambda_f^2] < \infty \tag{2.2}$$

We also assume that $\Lambda_f$ is increasing. By translation equivariance, we may restrict
ourselves to $\theta_0 = 0$, which will be suppressed in the notation.



The set of influence curves (ICs) $\Psi$ for the estimation of $\theta$ is defined as in Rieder
(1994)

$$\Psi := \{\psi \in L_2(F) \mid \mathrm{E}[\psi] = 0,\ \mathrm{E}[\psi \Lambda_f] = 1\}, \tag{2.3}$$

where both expectations are evaluated under $F$. As class of estimators we consider
asymptotically linear estimators (ALEs), i.e. estimators $S_n = S_n(X_1, \ldots, X_n)$ with
the property

$$\sqrt{n}\, S_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \psi(X_i) + o_{F^n}(n^0) \tag{2.4}$$

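We consider maximal mean squared error (MSE) on shrinking neighborhoods of this
ideal model, defined as the set $\mathcal{Q}_n(r)$ of distributions

$$\mathcal{L}(X_1, \ldots, X_n) = Q_n = \bigotimes_{i=1}^{n} \Bigl[\bigl(1 - \tfrac{r_n}{\sqrt{n}}\bigr) F + \tfrac{r_n}{\sqrt{n}}\, P^{\mathrm{di}}_{n,i}\Bigr] \tag{2.5}$$

with $r_n = \min(r, \sqrt{n})$, $r > 0$ the contamination radius and $P^{\mathrm{di}}_{n,i} \in \mathcal{M}_1(\mathbb{B})$ arbitrary,
uncontrollable contaminating distributions. As usual, we interpret $Q_n$ as the distribution
of the vector $(X_i)_{i \le n}$ with components

$$X_i := (1 - U_i) X_i^{\mathrm{id}} + U_i X_i^{\mathrm{di}}, \qquad i = 1, \ldots, n \tag{2.6}$$

for $X_i^{\mathrm{id}}$, $U_i$, $X_i^{\mathrm{di}}$ stochastically independent, $X_i^{\mathrm{id}} \sim F$, $U_i \sim \mathrm{Bin}(1, r/\sqrt{n})$, and $(X_i^{\mathrm{di}}) \sim P_n^{\mathrm{di}}$
for some arbitrary $P_n^{\mathrm{di}} \in \mathcal{M}_1(\mathbb{B}^n)$.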
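The construction (2.6) can be illustrated by a small simulation. The following is only a minimal sketch under stated assumptions: the ideal distribution is taken to be $F = \mathcal{N}(0,1)$, the contamination is a point mass chosen for illustration, and the names `sample_contaminated` and `draw_contamination` are hypothetical and not taken from the paper or its R files.

```python
import math
import random

def sample_contaminated(n, r, draw_contamination, rng):
    """Draw X_1, ..., X_n following the construction in (2.6)."""
    p = min(r, math.sqrt(n)) / math.sqrt(n)      # P(U_i = 1) = r_n / sqrt(n)
    xs, us = [], []
    for _ in range(n):
        u = 1 if rng.random() < p else 0          # U_i ~ Bin(1, r_n / sqrt(n))
        x_id = rng.gauss(0.0, 1.0)                # ideal part; F = N(0, 1) is an assumption
        x_di = draw_contamination(rng)            # contaminating part, arbitrary
        xs.append(x_di if u else x_id)            # X_i = (1 - U_i) X_i^id + U_i X_i^di
        us.append(u)
    return xs, us

rng = random.Random(1)
# Illustrative contamination: a Dirac mass at 10 (an assumption, not from the paper).
xs, us = sample_contaminated(2000, 0.5, lambda g: 10.0, rng)
print("share of contaminated observations:", sum(us) / len(us))
print("nominal rate r / sqrt(n):          ", 0.5 / math.sqrt(2000))
```

The observed share of contaminated observations fluctuates around $r/\sqrt{n}$, which shrinks with $n$; this is what "shrinking neighborhood" means in the setup above.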
Suppressing the dependency upon $\theta$ as usual, in Rieder (1994), the first order
expansion of maximal MSE of an ALE is derived as

$$\tilde{R}(S_n, r) = r^2 \sup|\psi|^2 + \mathrm{E}_{\mathrm{id}} |\psi|^2 \tag{2.7}$$

The (first-order) MSE-optimal IC $\eta_{b_0}$ in a smooth $p$-dimensional parametric model
with $L_2$-derivative $\Lambda$ by Theorem 5.5.7 (ibid.) has to be of Hampel form

$$\eta_{b_0} = Y \min\{1, b_0/|Y|\}, \qquad Y = A\Lambda - a \tag{2.8}$$

for some $A \in \mathbb{R}^{p \times p}$, $a \in \mathbb{R}^p$ such that $\eta_{b_0}$ is an IC, and $b_0$ solving $\mathrm{E}(|Y| - b_0)_+ = r^2 b_0$.
In our location context, for Lagrange multipliers $z$ and $A$ such that $\eta_{b_0} = \eta_{c_0} \in \Psi$, we
get that

$$\eta_{c_0} = A(\Lambda_f - z) \min\{1, c_0/|\Lambda_f - z|\}, \tag{2.9}$$
$$c_0 \ \text{s.t.}\ \ \mathrm{E}[(|\Lambda_f - z| - c_0)_+] = r^2 c_0 \tag{2.10}$$
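For the Gaussian location model, (2.10) and (2.7) can be evaluated numerically. The sketch below is an illustration under assumptions: $F = \mathcal{N}(0,1)$ (so $\Lambda_f(x) = x$ and $z = 0$ by symmetry), a plain bisection instead of whatever solver the author used, and hypothetical function names. It uses the closed forms $\mathrm{E}(|X|-c)_+ = 2(\varphi(c) - c(1-\Phi(c)))$, $A = 1/(2\Phi(c)-1)$ and $\mathrm{E}\min(|X|,c)^2 = (2\Phi(c)-1) - 2c\varphi(c) + 2c^2(1-\Phi(c))$.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """1 - Phi(x), computed via erfc to stay accurate in the tails."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def excess(c):
    """E(|X| - c)_+ for X ~ N(0, 1)."""
    return 2.0 * (phi(c) - c * upper_tail(c))

def clip_fo(r, lo=1e-8, hi=40.0):
    """First-order clipping height c_0: root of E(|X| - c)_+ = r^2 c, cf. (2.10), z = 0."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) - r * r * mid > 0.0:
            lo = mid              # left-hand side still too large: root lies to the right
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hampel_terms(c):
    """Return (b, v0^2) = (sup|eta_c|, E eta_c^2) for eta_c(x) = A * clip(x, -c, c)."""
    central = math.erf(c / math.sqrt(2.0))             # 2 Phi(c) - 1 = E[X * clip(X)]
    A = 1.0 / central                                   # standardization E[eta * Lambda] = 1
    second = central - 2.0 * c * phi(c) + 2.0 * c * c * upper_tail(c)   # E clip(X)^2
    return A * c, A * A * second

def as_mse0(c, r):
    """First-order maximal MSE (2.7) of the Hampel IC with clipping height c."""
    b, v0sq = hampel_terms(c)
    return r * r * b * b + v0sq

for r in (0.1, 0.25, 0.5, 1.0):
    c0 = clip_fo(r)
    print(r, round(c0, 3), round(as_mse0(c0, r), 3))
```

The printed pairs should be close to the $n = \infty$ column of Table 1 (for instance $c_0 \approx 1.948$ and first-order MSE $\approx 1.054$ at $r = 0.1$).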



2.2 Higher Order Expansion

In Ruckdeschel (2010a) we obtain corresponding higher order expansions of the maximal
MSE if we thin out the neighborhood system to the set $\tilde{\mathcal{Q}}_n(r;\varepsilon_0)$ of conditional
distributions

$$Q_n = \mathcal{L}\Bigl\{ \bigl[(1 - U_i) X_i^{\mathrm{id}} + U_i X_i^{\mathrm{di}}\bigr]_i \,\Big|\, \sum U_i < \lceil \varepsilon_0 n \rceil - 1 \Bigr\} \tag{2.11}$$

where $\varepsilon_0 = 1/(2 + \delta_0)$ is the functional (Huber (1981, (2.39), (2.40))) and the finite
sample ($\varepsilon$-contamination) breakdown point (Donoho and Huber (1983, section 2.2))
of the corresponding M-estimator and $\delta_0$ is defined by

b:=MiI/, fe = sup^, b:=Ub-b), 6 := ■ 'MU > (2.12) 

L T z min((— b),b) 

For the result we use the following assumptions and notation: To scores function
$\psi\colon \mathbb{R} \to \mathbb{R}$ let $\psi_t(x) := \psi(x - t)$ and define the following functions $L(t) := \mathrm{E}\,\psi_t$, $\psi_t^0 :=
\psi_t - L(t)$, $V(t)^2 := \mathrm{Var}\,\psi_t$, $\rho(t) := \mathrm{E}(\psi_t^0)^3 / V(t)^3$, $\kappa(t) := \mathrm{E}[(\psi_t^0)^4]/V(t)^4 - 3$. Let $\check{y}_n$ and $\hat{y}_n$ be
sequences in $\mathbb{R}$ such that for some $\gamma > 1$, $\psi(\check{y}_n) = \inf\psi + o(n^{-\gamma})$, $\psi(\hat{y}_n) = \sup\psi + o(n^{-\gamma})$.
For $H \in \mathcal{M}_1(\mathbb{B}^n)$ and an ordered set of indices $I = (1 \le i_1 < \ldots < i_k \le n)$ denote $H_I$
the marginal of $H$ with respect to $I$. Consider three sequences $c_n$, $d_n$, and $\kappa_n$ in $\mathbb{R}$, in
$(0, \infty)$, and in $\{1, \ldots, n\}$, respectively. We say that the sequence $(H^{(n)}) \subset \mathcal{M}_1(\mathbb{B}^n)$ is
$\kappa_n$-concentrated left [right] of $c_n$ up to $o(d_n)$, if for each sequence of ordered sets $I_n$
of cardinality $i_n \le \kappa_n$, $1 - H^{(n)}_{I_n}((-\infty; c_n]^{i_n}) = o(d_n)$, [ $1 - H^{(n)}_{I_n}((c_n, \infty)^{i_n}) = o(d_n)$ ]. For
the theorem we make the following assumptions:

(bmi) $\sup|\psi| = b < \infty$, $\psi$ monotone, $\psi \in \Psi$
(D) For some $\delta \in (0, 1]$, $L$, $V$, $\rho$, and $\kappa$ as defined above allow the expansions

$$L(t) = l_1 t + \tfrac{1}{2} l_2 t^2 + \tfrac{1}{6} l_3 t^3 + O(t^{3+\delta}), \qquad V(t) = v_0 \bigl(1 + v_1 t + \tfrac{1}{2} v_2 t^2\bigr) + O(t^{2+\delta}) \tag{2.13}$$
$$\rho(t) = \rho_0 + \rho_1 t + O(t^{1+\delta}), \qquad \kappa(t) = \kappa_0 + O(t^{\delta}) \tag{2.14}$$

(Pd) There are some $T > 0$ and $\eta > 0$ such that

$$F(t) > 1 - t^{-\eta} \ \text{for}\ t > T, \qquad F(t) < (-t)^{-\eta} \ \text{for}\ t < -T \tag{2.15}$$

(C) Let $f_t$ be the characteristic function of $\psi_t(X^{\mathrm{id}})$; then

$$\lim_{t_0 \to 0}\ \limsup_{s \to \infty}\ \sup_{|t| \le t_0} |f_t(s)| < 1 \tag{2.16}$$

With these preparations, we have the following theorem (Ruckdeschel (2010a, Thm. 3.5))



Theorem 2.1 In our one-dim. location model assume (bmi) to (C).

(a) The maximal MSE of the M-estimator $S_n$ to scores-function $\psi$ expands to

$$R_n(S_n, r, \varepsilon_0) = r^2 b^2 + v_0^2 + \frac{r}{\sqrt{n}}\, A_1 + \frac{1}{n}\, A_2 + o(n^{-1}) \tag{2.17}$$

with

$$A_1 = v_0^2 \bigl( \pm(4 v_1 + 3 l_2)\, b + 1 \bigr) + b^2 + \bigl[ 2 b^2 \pm l_2 b^3 \bigr] r^2 \tag{2.18}$$
A 2 = vo 3 {(h + 2 vi )p + I Pi ) + vo 4 (3 v 2 + ^ + fe + 9 vf + 12 vj / 2 ) + 

+[i'o 2 ((3v 2 + 3v 2 + f / 2 +2/3 + 12 vi / 2 )fo 2 + 1 ±(8i'i + 6/ 2 )fo) + 

±3 / 2 6 3 + 5 b 2 ] r 2 + f(f / 2 + i / 3 )* 4 ± 3 / 2 fo 3 + 3 fo 2 ) r 4 (2.19) 

and we are in the $-$[$+$]-case depending on whether (2.20) or (2.21) below applies.

(b) let := (^)" =1 P* ■ be contaminating measures for ( [2.5) . 77ie« 2„ w;?/i i 5 * 
as contaminating measures generates maximal risk in ( |2.17| > i//or &i > 1 anrf &2 > 
2 V (| + j^) with 6 from (Vb) and K\(n) — r k\r yfn~' either 

(P^) is Ki(n)— concentrated left ofy„ — b ^ki log(n)/n up to o(«~') (2.20) 



(P^) is K\(n)— concentrated right ofy n + b Jk~2 log(n)/n up to o(n ') (2.21) 

More precisely, ifsupt// < [>]— inf tff, the maximal MSE is achieved by contaminations 
according to ( |2T20| > [ §T2\\ ]. In case sup i]/ — ~ inf iff> ( |2.20| > | ( |2.21| i ] applies if 

vi > [<] - |(|(r 2 + 3)(1 + f n - + 3(1 - §)) (2.22) 

/f sup ip - ~ m f <A flnfl ' f/iere is "= " in ( |2.22| ), ( |2.20[ ) ana' ( |2.21[ ) generate the same risk 
up to order o(n ). 
Special cases Let $Q_n^0$ be any distribution in $\tilde{\mathcal{Q}}_n$ attaining maximal risk in Theorem 2.1.
Under symmetry or more specifically if

$$l_2 = v_1 = \rho_0 = 0, \tag{2.23}$$

we obtain as maximal risk in (2.17)



n Ego [S 



+ o(n _1 ), 



J ] = (rV + vo 2 ) (l + ^ + 4) + ^ + + V + + 1^ 

(v 2 (3v 2 + 2/ 3 )i> 2 )r 2 + j/ 3 ^ 



(3v 2 +/ 3 ) 



(2.24) 



while under $r = 0$ (with or without (2.23)), we get



nE F „[S z n ] = v Q z + ■ 



I'D 



3 ((fe + 2vi )po + f Pi) i'o 4 (3v 2 + l 3 + t '2 + 12 '2 + 9v 2 ) 



+ 0(0 (2.25) 



respectively, again under (2.23),



nE F „[S £ n ] = v z + 



2 f Vpi + v 4 (3 v 2 + h ) 



+ o{rf l ). 



(2.26) 



3 Other loss functions 



One easily shows that under similar conditions as for Theorem 2.1 we may replace
the squared loss function in the MSE by other loss functions $\ell$ growing at most at a
polynomial rate. In this respect, Theorem 2.1 easily extends to uniform convergence
of other risks on $\mathcal{Q}_n$, e.g. absolute error ($\ell(x) = |x|$), $L_k$-error ($\ell(x) = |x|^k$) for $1 \le k <
\infty$, and certain covering probabilities, $\ell(x) = \mathrm{I}_{(a_1, a_2)}(x)$ for some $a_1 < a_2 \in \mathbb{R}$.
As an illustration, we consider this last type of loss function, more specifically in the
form in which it arises in the finite minimax estimation theory as in Huber (1968) and
in which it has been extended to an as. setup by Rieder (1980): The risk is defined as



R\S n ,r)= sup max{£„(S„ >9+ ^-), Q„(S n <6- ■%)} (3.1) 



Fraiman et al. (2001) have taken up a similar setup with conventional confidence
intervals to cover bias and variance simultaneously. We work in the setup of Rieder
(1980) here and confine ourselves to the higher order terms of order $n^{-1/2}$, but of
course an extension to terms up to order $n^{-1}$ as in Theorem 2.1 is feasible. Due to
translation equivariance, it is no restriction to consider the case $\theta = 0$ only. As in
Rieder (1980), we work with a possibly asymmetric partition of the interval of given
length $2a/\sqrt{n}$ laid around the estimator: Using the partition

$$2a = a_1 + a_2 = a_1(S_n) + a_2(S_n), \tag{3.2}$$



we minimize the risk according to Rieder|( fT980| formulas (2.8) and (2.1 1) in), if with 
b, b, and b from ( |2.12| ) and 

ai = a - 6, a 2 = a + 6, 6= + (3.3) 

If we now account for terms of order $1/\sqrt{n}$ we minimize the risk if we use the partition

$$2a = a_1' + a_2' = a_1'(S_n) + a_2'(S_n), \tag{3.4}$$

with

$$a_1' = a - \delta - \delta', \qquad a_2' = a + \delta + \delta', \tag{3.5}$$

$\delta' = \delta_n'$ given in the theorem below. To this end, let

$$s_1 := (-a + r b)/v_0 \tag{3.6}$$

Then, with $\Phi$ and $\varphi$ c.d.f. and density of $\mathcal{N}(0, 1)$ and using the notation of
Theorem 2.1 we have


Theorem 3.1 For the location model (2.1) of finite Fisher information (2.2), assume
(bmi), (D') and (C). Then for sample size $n$, the minimal over-/undershooting probability
of an M-estimator $S_n$ for scores-function $\psi$ in $\mathcal{Q}_n$ obtains eventually in $n$ as

R\S„) = sup max{e„(S„ < ^), Q„(S„ > -^)} = 

fl,e<2„ yn yn 

= R-(S n ,Q° n ._) = R + (S n ,Q° n . + ) (3.7) 
with Q®. _ resp. Q®. + according to ( |2.20| > resp. ( |2.21[ ) and 
R-(.S n ,Q° n ._) = 0(s 1 ) + ^ 7gt p(s 1 )x 

x[f + 2l 2 a6 - as lVl v - + f ] + o(^) (3.8) 

and 8' — 8' n according to 

*=u-£o~ ^-y + 62) - - ^ - i} + ,j f + s) (3 - 9) 

Remark 3.2 (a) If $l_2 = v_1 = 0$ and $\hat{b} = -\check{b}$, we obtain the same result as (3.8) if we use the
expressions $b_n := \mathrm{Bias}_n$ and $v_n^2 := \mathrm{Var}_n$ for bias and variance from Ruckdeschel (2010a, Prop. 6.4), plug
them into the as. risk, which gives $\Phi((r b_n - a)/v_n)$, and then expand this up to $o(n^{-1/2})$.

(b) The numerical values obtainable by Theorem 3.1 should be compared to those of Kohl (2005,
sections 11.3.3.3 and 11.4.1); admittedly the approach of Theorem 3.1 in this context gives rather poor
(too liberal) approximations compared to those in the cited reference (see the R-file Thm31.R available on
the web-page to this article); this is plausible though, as Kohl already starts with finitely optimal procedures
whereas our approach improves upon asymptotically optimal ones.



4 Consequences: Higher Order Optimality and Relative Risk 

In this section, we consider the class $\mathcal{S}_2$ of all M-estimators according to (bmi), (D'),
and (C) as well as (Pd); correspondingly, we define $\mathcal{S}_3$ with (D), (C) replacing (D'),
(C); we always assume that the class of M-estimators $\mathcal{H}$ of ICs of Hampel-type (2.9)
forms a subset of $\mathcal{S}_2$ [$\mathcal{S}_3$]. In particular we assume $f$ to be log-concave.

4.1 Second-order optimality

Symmetry allows considerable simplifications; for instance, if $F$ is symmetric, i.e.
$F(B) = F(-B)$ for all $B \in \mathbb{B}$, in (2.9) always $z = 0$. But also, much deeper results are
possible. Thus for the rest of this subsection, we assume (2.23). Then (2.24) gives the
s-o-maximal MSE for any M-estimator in $\mathcal{S}_2$, in particular

$$A_1 = v_0^2 + b^2 (1 + 2 r^2) \tag{4.1}$$

Condition (2.23) clearly holds for skew symmetric $\psi$ and symmetric $F$. For symmetric
$F$, however, for any IC $\psi$, also $\tilde\psi := -\psi(-\,\cdot\,)$ is an IC and hence so is the
skew-symmetrized $\psi_a := \tfrac{1}{2}(\psi + \tilde\psi)$, too. But by convexity of the MSE, $\psi_a$ will be at
least as good as $\psi$ as to MSE, hence it is no restriction to only consider skew symmetric
ICs, and we fall into the application range of Ruckdeschel and Rieder (2004,
Thm. 3.1), i.e.,

Theorem 4.1 Assume that maximal as. risk of an ALE on $\mathcal{Q}_n$ resp. $\tilde{\mathcal{Q}}_n(\,\cdot\,, \varepsilon_0)$ is
representable as $G(r\, b(\psi), v_0(\psi))$ for some convex real-valued function $G(w, s)$, strictly
isotone in both arguments and totally differentiable, bounded away from the minimum
for $w \to \infty$. Then, on $\mathcal{Q}_n$, respectively on $\tilde{\mathcal{Q}}_n$, the optimal IC is of Hampel-type
(2.9) for some clipping height $b = Ac$ determined by

$$r\, v_0\, \partial_w G(rAc, v_0) = \partial_s G(rAc, v_0)\, A\, \mathrm{E}(|\Lambda - z| - c)_+ \tag{4.2}$$

In our case, this theorem specializes to 

Corollary 4.2 Assume a symmetric model (2.1) with increasing $\Lambda_f$ and (2.2). Under
the assumptions of this section, the s-o-o M-estimator in class $\mathcal{S}_2$ has an IC of
Hampel-type (2.9) with $z = 0$ and the s-o-o clipping height $c_1 = c_1(n)$ is determined
by

$$r^2 c \Bigl(1 + \frac{r^2 + 1}{r^2 + r\sqrt{n}}\Bigr) = \mathrm{E}(|\Lambda| - c)_+ \tag{4.3}$$

Always, $c_0 > c_1(n)$. Suppose that $h(c) := \mathrm{E}(|\Lambda| - c)_+$ is differentiable in $c_0$ with
derivative $h'(c_0)$. Then,

$$c_1(n) = c_0 \Bigl(1 - \frac{1}{\sqrt{n}}\, \frac{r^3 + r}{r^2 - h'(c_0)}\Bigr) + o(\tfrac{1}{\sqrt{n}}) \tag{4.4}$$

That is, (for $n$ large enough) the f-o-o clipping height $c_0$ always is too optimistic.
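To see how (4.3) and (4.4) relate numerically, the following sketch solves (4.3) in the Gaussian model and compares the result with the linearized form (4.4). It is an illustration only, assuming $F = \mathcal{N}(0,1)$, for which $h(c) = \mathrm{E}(|X| - c)_+ = 2(\varphi(c) - c(1 - \Phi(c)))$ and $h'(c) = -P(|X| > c)$; bisection stands in for the `optimize` call mentioned later in the paper, and all function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def excess(c):
    """h(c) = E(|X| - c)_+ for X ~ N(0, 1)."""
    return 2.0 * (phi(c) - c * upper_tail(c))

def solve_clip(weight, lo=1e-8, hi=40.0):
    """Root of h(c) = weight * c by bisection (h is decreasing, weight * c increasing)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) - weight * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clip_fo(r):
    """First-order clipping height c_0 from (2.10)."""
    return solve_clip(r * r)

def clip_so(r, n):
    """Second-order clipping height c_1(n) from (4.3)."""
    factor = 1.0 + (r * r + 1.0) / (r * r + r * math.sqrt(n))
    return solve_clip(r * r * factor)

def clip_so_expansion(r, n):
    """Linearized c_1(n) according to (4.4), without the o(n^{-1/2}) remainder."""
    c0 = clip_fo(r)
    h_prime = -2.0 * upper_tail(c0)               # h'(c_0) = -P(|X| > c_0)
    return c0 * (1.0 - (r ** 3 + r) / (math.sqrt(n) * (r * r - h_prime)))

for n in (5, 30, 100, 10 ** 4):
    print(n, round(clip_so(0.5, n), 4), round(clip_so_expansion(0.5, n), 4))
```

For small $n$ the linearization undershoots noticeably, while for large $n$ both values agree and approach $c_0$ from below, in line with $c_0 > c_1(n)$.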

Assume s-o risk of ICs of Hampel-type (2.9) is smooth enough in $c$ in its minimum
$c_1$ to allow a s-o Taylor expansion, which is an assumption on the remainder
$o(n^{-1})$ present in (2.17). Then, around $c_1$, s-o risk behaves like a parabola. But, as by
(4.4), $c_1 - c_0 = O(1/\sqrt{n})$, using $c_1$ instead of $c_0$ can only improve s-o risk by order
$O(1/n)$. This even carries over to risks "near" s-o risk:

4.2 Consequences for the exact MSE



Proposition 4.3 Let $F, F_n, G_n \in C^2(\mathbb{R})$, $n \in \mathbb{N}$, such that for some $\beta \ge \beta' > 0$

(i) $\sup_x |F_n - G_n| + |F_n' - G_n'| + |F_n'' - G_n''| = o(n^{-\beta})$
(ii) $\sup_x |F_n - F| + |F_n' - F'| + |F_n'' - F''| = O(n^{-\beta'})$ (4.5)

Assume that in $x_0 \in \mathbb{R}$, $F(x_0)$ is minimal, and that $F''(x_0) = f_2 > 0$. Then

(a) there is some sequence $(x_n) \subset \mathbb{R}$ such that eventually in $n$, $F_n(x_n)$ is minimal
and $\lim_n F_n''(x_n) = f_2$.
(b) $|x_n - x_0| = O(n^{-\beta'})$.
(c) there is some sequence $(y_n) \subset \mathbb{R}$ such that eventually in $n$, $G_n(y_n)$ is minimal
and $\lim_n G_n''(y_n) = f_2$.
(d) $|y_n - x_n| = O(n^{-\beta})$.
(e) $0 \le G_n(x_n) - G_n(y_n) = O(n^{-2\beta})$.



The drawback of this proposition is that assumption (4.5) is difficult to check if we
have no explicit expression for $G_n$: For given $r > 0$, let $\mathrm{asMSE}_j(c)$, $j = 0, 1, 2$, be the f-o,
s-o, and t-o maximal MSE of an M-estimator in $\mathcal{H}$, and $\mathrm{exMSE}(c)$ the corresponding
exact maximal MSE $R_n$; we would like to apply Proposition 4.3 to $F = \mathrm{asMSE}_0$,
$F_n = \mathrm{asMSE}_j$, $j = 1, 2$, and $G_n = \mathrm{exMSE}$ to conclude on the performance of f-o-o, s-o-o,
t-o-o procedures as to exMSE. As to (4.5), part (ii) is easy to see checking the
expressions, giving $\beta' = 1/2$, while for part (i) Theorem 2.1 only says that $\sup_c |F_n -
G_n| = o(n^{-j/2})$ which in fact is $O(n^{-(j/2+\delta)})$, and probably, under slightly stronger
assumptions, $O(n^{-(j+1)/2})$. So presumably, in view of Table 2,

$$0 \le \mathrm{exMSE}(c_{j,n}) - \mathrm{exMSE}(c_{\mathrm{ex},n}) = O(n^{-(j+1)}), \qquad j = 0, 1, 2 \tag{4.6}$$

Remark 4.4 We even conjecture that we may apply an analogue to Proposition 4.3 for functions
$F, F_n, G_n\colon \Psi \to \mathbb{R}$: Let us denote by $\psi^{(j;n)}$ the corresponding f-o, s-o, t-o optimal IC and $\psi^{(\mathrm{ex};n)}$ the
exactly optimal IC; then, with the usual abuse of notation as to exMSE, we conjecture that

$$0 \le \mathrm{exMSE}(\psi^{(j;n)}) - \mathrm{exMSE}(\psi^{(\mathrm{ex};n)}) = O(n^{-j-1}), \qquad j = 0, 1, 2 \tag{4.7}$$



4.3 Relative risk 

An observation in the simulation study was that the relative MSE w.r.t. the MSE of the 
f-o-o procedure seemed to converge faster than the absolute terms. This is reflected 
by our formulas as follows: 



4.3.1 Contaminated situation 



Let $\mathrm{asMSE}_0(c)$ and $A_1(c)$ be the f-o as. MSE and the corresponding s-o correction
term for the Hampel-IC with clipping height $c$. Then we may write for the f-o [s-o]
relative risk $\mathrm{relMSE}_0(c, r)$ [$\mathrm{relMSE}_1(c, r, n)$] w.r.t. the corresponding risk of the f-o-o
procedure

$$\mathrm{relMSE}_1(c, r, n) := \frac{\mathrm{asMSE}_0(c) + \frac{r}{\sqrt{n}} A_1(c)}{\mathrm{asMSE}_0(c_0) + \frac{r}{\sqrt{n}} A_1(c_0)} = \mathrm{relMSE}_0(c, r) \Bigl(1 + \frac{r}{\sqrt{n}} \bigl(\Delta(c) - \Delta(c_0)\bigr)\Bigr) + o(n^{-1/2}) \tag{4.8}$$

with

$$\Delta(c) := \frac{b^2(c) - v_0^2(c)}{\mathrm{asMSE}_0(c)} \tag{4.9}$$

So in fact, the observed faster convergence is not reflected by higher order optimality,
but as we will see, the differences between $\mathrm{relMSE}_0(c, r)$ and $\mathrm{relMSE}_1(c, r, n)$ are in fact
small.
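The step from the quotient in (4.8) to the correction term (4.9) is not spelled out; a short derivation, assuming the symmetric case (2.23) so that $A_1$ is given by (4.1) and $\mathrm{asMSE}_0 = r^2 b^2 + v_0^2$, could run as follows.

```latex
\begin{align*}
\frac{A_1(c)}{\mathrm{asMSE}_0(c)}
  &= \frac{v_0^2 + b^2 (1 + 2 r^2)}{r^2 b^2 + v_0^2}
   = 2 + \frac{b^2 - v_0^2}{r^2 b^2 + v_0^2}
   = 2 + \Delta(c), \\[4pt]
\mathrm{relMSE}_1(c, r, n)
  &= \mathrm{relMSE}_0(c, r)\,
     \frac{1 + \frac{r}{\sqrt{n}} \bigl(2 + \Delta(c)\bigr)}
          {1 + \frac{r}{\sqrt{n}} \bigl(2 + \Delta(c_0)\bigr)}
   = \mathrm{relMSE}_0(c, r)
     \Bigl(1 + \frac{r}{\sqrt{n}} \bigl(\Delta(c) - \Delta(c_0)\bigr)\Bigr) + o(n^{-1/2}).
\end{align*}
```

The constant 2 cancels in the quotient, so only the difference $\Delta(c) - \Delta(c_0)$ enters the $n^{-1/2}$ term.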

Procedure choice will usually be based on relative risk, so it is interesting to consider
the maximal error compared to the s-o approximation one incurs when using the
f-o asymptotics instead. In view of Subsection 4.1 we will limit ourselves to only
considering Hampel-ICs with a clipping height $c$ in the range

$$C(c_0, \rho) := [c_0/(1 + \rho),\ c_0 (1 + \rho)], \tag{4.10}$$

for $\rho > 0$. This leads us to

$$\Delta\mathrm{relMSE}(r; \rho) := \max_{c \in C(c_0(r), \rho)} r\, \bigl(\Delta(c) - \Delta(c_0(r))\bigr) \tag{4.11}$$

or even maximizing over the radius

$$\Delta(\rho) := \Delta\mathrm{relMSE}(\rho) := \max_r \Delta\mathrm{relMSE}(r; \rho) \tag{4.12}$$

In the Gaussian case, the function $r \mapsto \Delta\mathrm{relMSE}(r; \rho)$ is plotted for $\rho = 0.1$ in
Figure 1 and for $\Delta(0.1)$, we get a value of 0.065, which for an actual sample size $n$ has
to be divided by $\sqrt{n}$: an astonishingly good approximation!
So down to very moderate sample sizes we can base our decision which clipping
height to take to achieve "nearly" the optimal MSE on $\mathcal{Q}_n$ on f-o asymptotics
only. A similar consideration is of course possible for the ideal situation.
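The quantity in (4.11) and (4.12) can be approximated on a grid. The sketch below is a rough illustration under assumptions, not the author's computation: $F = \mathcal{N}(0,1)$, a finite grid of radii in $(0, 3]$ and of clipping heights in $C(c_0(r), \rho)$, and hypothetical function names.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def clip_fo(r, lo=1e-8, hi=40.0):
    """First-order clipping height c_0(r): root of E(|X| - c)_+ = r^2 c."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * (phi(mid) - mid * upper_tail(mid)) - r * r * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def hampel_terms(c):
    """(b, v0^2) of the Hampel IC with clipping height c in the Gaussian model."""
    central = math.erf(c / math.sqrt(2.0))
    A = 1.0 / central
    second = central - 2.0 * c * phi(c) + 2.0 * c * c * upper_tail(c)
    return A * c, A * A * second

def delta(c, r):
    """Delta(c) from (4.9)."""
    b, v0sq = hampel_terms(c)
    return (b * b - v0sq) / (r * r * b * b + v0sq)

def delta_rel_mse(r, rho=0.1, grid=41):
    """Grid approximation of (4.11) over c in [c_0/(1+rho), c_0 (1+rho)]."""
    c0 = clip_fo(r)
    lo, hi = c0 / (1.0 + rho), c0 * (1.0 + rho)
    d0 = delta(c0, r)
    return max(r * (delta(lo + k * (hi - lo) / (grid - 1), r) - d0) for k in range(grid))

radii = [0.02 * k for k in range(1, 151)]          # r = 0.02, 0.04, ..., 3.0
worst = max(delta_rel_mse(r) for r in radii)       # grid approximation of (4.12)
print(round(worst, 3))
```

On this grid the maximum comes out in the vicinity of the value 0.065 quoted above, attained for small radii near $r = 0.1$.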

4.3.2 Illustration 

As an example we take $F = \mathcal{N}(0, 1)$ and calculate the terms $c_1$,

$$\mathrm{asMSE}_1 := \mathrm{asMSE}_0 + \frac{r}{\sqrt{n}}\, A_1 \tag{4.13}$$

and $\mathrm{relMSE}_1$ for the radii and sample sizes of the simulation study where for the
optimization for $c_1$ we use the function optimize in R 2.11.0 (compare R Development
Core Team (2010)). The results are tabulated in Table 1. Correspondingly, we
also determine the t-o terms $c_2$,

$$\mathrm{asMSE}_2 := \mathrm{asMSE}_1 + A_2/n \tag{4.14}$$
Fig. 1 The mapping $r \mapsto \Delta\mathrm{relMSE}(r; \rho)$ for $F = \mathcal{N}(0, 1)$ and for $\rho = 0.1$.

Table 1 $c_1(r, n)$, $\mathrm{asMSE}_1(c_1(r, n), r, n)$ and $\mathrm{relMSE}_1(c_1(r, n), r, n)$

r      quantity    n = 5     n = 10    n = 30    n = 50    n = 100   n = ∞
0.1    c_1         1.394     1.484     1.611     1.663     1.724     1.948
       asMSE_1     1.248     1.197     1.140     1.122     1.103     1.054
       relMSE_1    3.476%    2.149%    0.939%    0.623%    0.349%    0.000%
0.25   c_1         0.994     1.059     1.147     1.181     1.219     1.339
       asMSE_1     1.635     1.519     1.397     1.358     1.319     1.220
       relMSE_1    2.377%    1.470%    0.632%    0.414%    0.228%    0.000%
0.5    c_1         0.650     0.690     0.746     0.767     0.790     0.862
       asMSE_1     2.527     2.271     2.006     1.923     1.840     1.636
       relMSE_1    1.214%    0.772%    0.342%    0.226%    0.126%    0.000%
1.0    c_1         0.320     0.340     0.369     0.380     0.394     0.436
       asMSE_1     5.761     4.944     4.110     3.852     3.593     2.964
       relMSE_1    0.427%    0.292%    0.142%    0.098%    0.056%    0.000%

and in Figure 2 we plot the graphs of the five functions

$r \mapsto \mathrm{asMSE}_0(\eta_{c_0(r)}, r)$, $r \mapsto \mathrm{asMSE}_1(\eta_{c_0(r)}, r, n)$, $r \mapsto \mathrm{asMSE}_2(\eta_{c_0(r)}, r, n)$,
$r \mapsto \mathrm{asMSE}_1(\eta_{c_1(r,n)}, r, n)$, $r \mapsto \mathrm{asMSE}_2(\eta_{c_2(r,n)}, r, n)$

for $F = \mathcal{N}(0, 1)$ and for $n = 30$. In fact, the choice of the clipping height ($c_0(r)$,
$c_1(r, n)$, $c_2(r, n)$) does not entail any visible changes while the absolute values of f-o,
s-o, and t-o MSE clearly differ.

In the same situation, the three functions $r \mapsto c_0(r)$, $r \mapsto c_1(r, n)$, $r \mapsto c_2(r, n)$ are
plotted in Figure 3: while there are visible differences between $c_0(r)$ and $c_i(r, n)$,
$i = 1, 2$, $c_1(r, n)$ and $c_2(r, n)$ visually coincide.

Fig. 2 The mapping $r \mapsto \mathrm{asMSE}_i[, n](\eta_{c_j(r[, n])}, r[, n])$ for $i = 0, 1, 2$, $j = 0, i$,
$n = 30$ and $F = \mathcal{N}(0, 1)$

Fig. 3 The mapping $r \mapsto c_j(r[, n])$ for $j = 0, 1, 2$ and $n = 30$ and $F = \mathcal{N}(0, 1)$
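The s-o quantities of this illustration can be recomputed directly from (4.1), (4.3) and (4.13). The sketch below assumes $F = \mathcal{N}(0,1)$ and uses bisection in place of R's `optimize`; the function names are hypothetical. The relative term is computed as $\mathrm{asMSE}_1(c_0)/\mathrm{asMSE}_1(c_1) - 1$, which is an assumption about how the tabulated percentages were formed (it reproduces them closely, but the text does not state it explicitly).

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def solve_clip(weight, lo=1e-8, hi=40.0):
    """Root of E(|X| - c)_+ = weight * c by bisection, X ~ N(0, 1)."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * (phi(mid) - mid * upper_tail(mid)) - weight * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clip_fo(r):
    """c_0(r) from (2.10)."""
    return solve_clip(r * r)

def clip_so(r, n):
    """c_1(r, n) from (4.3)."""
    return solve_clip(r * r * (1.0 + (r * r + 1.0) / (r * r + r * math.sqrt(n))))

def as_mse1(c, r, n):
    """asMSE_1 from (4.13) with A_1 = v0^2 + b^2 (1 + 2 r^2) as in (4.1)."""
    central = math.erf(c / math.sqrt(2.0))
    A = 1.0 / central
    b = A * c
    v0sq = A * A * (central - 2.0 * c * phi(c) + 2.0 * c * c * upper_tail(c))
    a1 = v0sq + b * b * (1.0 + 2.0 * r * r)
    return r * r * b * b + v0sq + r / math.sqrt(n) * a1

def rel_mse1_percent(r, n):
    """Loss (in %) of the f-o-o clipping height relative to the s-o-o one, in s-o risk."""
    return 100.0 * (as_mse1(clip_fo(r), r, n) / as_mse1(clip_so(r, n), r, n) - 1.0)

for r in (0.1, 0.25, 0.5, 1.0):
    for n in (5, 10, 30, 50, 100):
        c1 = clip_so(r, n)
        print(r, n, round(c1, 3), round(as_mse1(c1, r, n), 3), round(rel_mse1_percent(r, n), 3))
```

The printed triples can be compared line by line with the finite-$n$ columns of Table 1.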
4.4 Comparison with the approach by Fraiman et al. (2001)

Fraiman et al. (2001) work in a similar setup, i.e. the one-dimensional location problem
where the center distribution is $F_0 = \mathcal{N}(0, \sigma^2)$ and an M-estimator $S_n$ to skew
symmetric scores $\psi$ is searched which minimizes the maximal risk on a neighborhood
about $F_0$. Contrary to our approach, the authors work with convex contamination
neighborhoods $\mathcal{V} = \mathcal{V}(F, \varepsilon)$ to a fixed radius $\varepsilon$.

There has been some discussion which approach (fixed or shrinking radius) is more
appropriate, but for fixed sample size $n$, of course we may translate the fixed radius $\varepsilon$
into our radius $r/\sqrt{n}$ and then compare the approximation quality of both approaches.
Fraiman et al. (2001) propose to use risks which are constructed by means of a positive
function $g\colon \mathbb{R} \times \mathbb{R}_+ \to \mathbb{R}_+$ of as. bias $B = B(F, \psi)$ and as. variance $v^2 = V^2(F, \psi)$.
Here, $B$ is defined as zero of $\beta \mapsto (1 - \varepsilon) \int \psi_\beta\, dF + \varepsilon b$, and $v^2 := V_1/V_2^2$ for
$V_1 = (1 - \varepsilon) \int \psi_B^2\, dF + \varepsilon b^2$ and $V_2 = (1 - \varepsilon) \int \psi_B'\, dF$.

Function $g$ is assumed lower semicontinuous and symmetric in the first argument
as well as isotone in each argument. The risk of an M-estimator to IC $\psi$ is taken as
the function

$$L_g(\psi) = \sup_{G \in \mathcal{V}} g\bigl(B(G, \psi),\ v(G, \psi)/n\bigr) \tag{4.15}$$

A MSE-type risk then is given by $g(u, v) = u^2 + v$. It is not quite MSE, as it employs
the as. terms $B$ and $v$, so their results may differ from ours. The crucial point is that to
solve their optimization problem, the authors have to assume that besides bias, also
variance is maximized (for their optimal $\psi$) if we contaminate with a Dirac measure
in $\infty$. According to this assumption, if we introduce $G_0 := (1 - \varepsilon) F_0 + \varepsilon \mathrm{I}_{\{\infty\}}$, we have
to find $\psi$ minimizing

$$l_g(\psi) = g\bigl(B(G_0, \psi),\ v(G_0, \psi)/n\bigr) \tag{4.16}$$

Differently to the Hampel-type ICs the solutions to this problem are of form

$$\psi_{a,b,t}(x) = a \tanh(t x) + b\,[x - t \tanh(t x)] \tag{4.18}$$

but the "MSE"-optimal solutions are numerically quite close to corresponding Hampel-ICs
$\psi_H$, for which the authors in turn show that always $L_g(\psi_H) = l_g(\psi_H)$.
For an implementation of this optimization see the R-file FYZ.R available on the
web-page.

A comparison

As a sort of benchmark for our results, we reproduce a comparison to be found in
Ruckdeschel and Kohl (2010), albeit in some more detail than in the cited reference:
For a set of values for $n$ and $r$, we determine the "MSE"-optimal $\psi$ and a corresponding
Hampel IC $\psi_H$ which is then compared to the f-o-o and s-o-o IC derived in this
paper. Within the class of Hampel-ICs, numerically, we also determine the t-o-o and
the "exactly" optimal clipping heights, $c_2$ and $c_{\mathrm{ex}}$ respectively. We compare the resulting
ICs as to their clipping height and the corresponding (numerically exact) value of
$R_n(S_n, r)$, denoted by $\mathrm{MSE}_n$; the latter comparison is done by the terms $\mathrm{relMSE}_n(c)$,
calculated as

$$\mathrm{relMSE}_n(c_\cdot) = \bigl(\mathrm{MSE}_n(c_\cdot)/\mathrm{MSE}_n(c_{\mathrm{ex}}) - 1\bigr) \times 100\% \tag{4.19}$$

The results are displayed in Table 2. Also compare the function allMSEs in the R-file
asMSE.R available on the web-page to this article.

For the numerical evaluation of the MSE, we use Algorithms C (more accurate, but
slow for larger $n$) and D (a little inaccurate for small $n$, but fast) discussed in
Ruckdeschel and Kohl (2010). For $n = \infty$, we evaluate the corresponding f-o as. MSE for
the IC to the corresponding values of $c$. As a cross-check, the clipping heights $c_i$,
$i = 0, 1, 2$ are also determined for $n = 10^8$. In case of $c_{\mathrm{FZY}}$, for all finite $n$'s the error
tolerance used in optimize in R was $10^{-4}$, while for $n = \infty$ it was $10^{-12}$. For $c_{\mathrm{ex}}$ and
$n = 10^8$, an optimization of the (numerically) exact MSE would have been too
time-consuming and has been skipped for this reason. Also, for $n = 5$, the radius $r = 1.0$,
corresponding to $\varepsilon = 0.447$, is not admitted for an optimization of (4.16) and thus no
result is available in this case.



5 Ramifications: Minimax radius and Cniper contamination

5.1 Minimax radius

In this subsection, we refine the results of Rieder et al. (2008). In the cited paper, we
want to give a guideline to the statistician which procedure to choose if he knows that
there is contamination but does not know the radius exactly: To this end, we consider
the maximal inefficiency $\bar\rho_0(r')$ defined as

$$\bar\rho_0(r') := \sup_{r \in (r_l, r_u)} \rho_0(r', r), \qquad \rho_0(r', r) := \frac{\tilde{R}(\eta_{c_0(r')}, r)}{\tilde{R}(\eta_{c_0(r)}, r)} \tag{5.1}$$

and determine the minimax radius $r_0$ as minimizer of $\bar\rho_0(r')$. If one knows at least that
the actual radius will lie in an interval $[r/\gamma, r\gamma]$ we may determine $r_{\gamma,r}$ as minimizer
of $\bar\rho_\gamma(r', r) = \sup_{s \in (r/\gamma,\, r\gamma)} \rho_0(r', s)$ and denote the corresponding minimax inefficiency
by $\bar\rho_\gamma(r)$. In a second optimizing step we then determine the maximizer $r_\gamma$ of $\bar\rho_\gamma(r)$.
The unrestricted case is symbolically included by $\gamma = \infty$. In the Gaussian location
case this gives


               r_γ      c_0(r_γ)   ρ_γ(r_γ)
unrestricted   0.621    0.718      18.07%
γ = 2          0.575    0.769      8.84%
γ = 3          0.549    0.799      4.41%
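The unrestricted f-o figures can be checked numerically. The following sketch assumes $F = \mathcal{N}(0,1)$, takes the inefficiency to be the ratio of the f-o maximal MSE (2.7) of the Hampel IC tuned for $r'$ to that of the IC tuned for the true $r$, and approximates the supremum over $r$ by a finite logarithmic grid from 0.001 to 100; all function names are hypothetical.

```python
import math

def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def upper_tail(x):
    """1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def clip_fo(r, lo=1e-8, hi=40.0):
    """c_0(r): root of E(|X| - c)_+ = r^2 c for X ~ N(0, 1), by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if 2.0 * (phi(mid) - mid * upper_tail(mid)) - r * r * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def as_mse0(c, r):
    """First-order maximal MSE r^2 b^2 + v0^2 of the Hampel IC with clipping height c."""
    central = math.erf(c / math.sqrt(2.0))
    A = 1.0 / central
    b = A * c
    v0sq = A * A * (central - 2.0 * c * phi(c) + 2.0 * c * c * upper_tail(c))
    return r * r * b * b + v0sq

def max_inefficiency(r_prime, radii):
    """Grid approximation of the maximal inefficiency when tuning for r_prime."""
    c_used = clip_fo(r_prime)
    return max(as_mse0(c_used, r) / as_mse0(clip_fo(r), r) for r in radii)

radii = [10.0 ** (k / 10.0) for k in range(-30, 21)]     # 0.001, ..., 100
for r_prime in (0.3, 0.621, 1.0):
    print(r_prime, round(100.0 * (max_inefficiency(r_prime, radii) - 1.0), 2), "%")
```

Tuning for $r' = 0.621$ gives a maximal loss of roughly 18% on this grid, and both smaller and larger choices of $r'$ do worse, consistent with the unrestricted entry above.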



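These calculations can easily be translated to the s-o setup, setting

$$R_1(\psi, r, n) := r^2 \sup|\psi|^2 + \mathrm{E}\,\psi^2 + \frac{r}{\sqrt{n}}\, A_1 \tag{5.2}$$

so that in this paper we would instead determine $r_1(n)$ as minimizer of $\bar\rho_1(r', n)$,

$$\bar\rho_1(r', n) := \sup_{r \in (r_l, r_u)} \rho_1(r', r, n), \qquad \rho_1(r', r, n) := \frac{R_1(\eta_{c_1(r', n)}, r, n)}{R_1(\eta_{c_1(r, n)}, r, n)} \tag{5.3}$$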
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Table 2 Optimal clipping heights and corresponding (numerically) exact MSE 



Y 


/i — J 


« — 1 n 


_ QA 


11 — JU 


. i _ i nn 

A( — 1UU 


11 — DO 


co 


1.948 


1.948 


1.948 


1.948 


1.948 


1.948 


relMbb,, (co) 


o.o/y% 


4. Do j to 


1 1/1 AO/ 

1 .34U% 


A Q1Z.01 

O.o3o% 


A A A O 01 
U.44O70 






1.394 


1.484 


1.611 


1.663 


1.724 


1.948 


reuvLoxi n \c\) 


U.OJJ 70 


U.ZU/ /o 


a rw7is, 
u.uz / /o 


a n 1 a cl 
U.U14 ?o 


A A 1 not. 




o.i C2 

relMab,, ycj) 


1.309 


1.428 


1.585 


1.644 


1.713 


1.948 


0.332% 


0.066% 


0.0087c 


0.004% 


0. 0067o 




C FZY 


1.368 


1.370 


1.610 


1.668 


1.756 


1.939 


relMSEJ; (c FZY ) 


0.658% 


0.002% 


rv /i i erf 

0.026% 


A no 1 (77 

0.021% 


A m 1 of 

0.031% 






1.167 


1.358 


1.560 


1.630 


1.704 




MCE f n \ 


l.JOO 


1 

1 .Ljy 


1 1 S 1 


1 1 9Q 


1 1 (XI 




CO 


1.339 


1.339 


1.339 


1.339 


1.339 


1.339 


reiivioE- n (co; 


A TQAC7 
O.ZoUyo 


J.OO 1 yO 


i 1 no 01 
1 AUa/c 


U.OjOVo 


A 11A07 
U. DD\J /O 




c\ 


0.994 


1.059 


1.147 


1.181 


1.219 


1.339 


relM3ii ; . ( \C\ ) 




U .41 j yo 




a nil ci 


A AAQ07 

u.uuy/o 




0.25 C ' 2 

relMob n \ ci) 


0.890 

a 1 A\ C7 

U.24 1 70 


a nnn 

u.yyu 

a i A/i o/ 
U. 1U4%> 


1.114 
U.UUy vo 


1.159 

A AA1 07 


1 .207 

A AA"3 07 


1 .339 


C FZY 


0.924 


1.020 


1.205 


1.177 


1.211 


1.338 


relMbb /; (c? FZY ) 


0.417% 


0.215% 


U.Zjjvo 


|"\ fit Off/ 

U AM 57c 


A A AO 07 
U.UU2% 






0.783 


0.921 


1.092 


1.140 


1.205 






2 225 


1 705 


1 438 


1 381 


1 no 




CO 


0.862 


0.862 


0.862 


0.862 


0.862 


0.862 


relMsb,, (co) 


2.yjo% 


2.655% 


0.792% 


A A A £Lftf 

0.446% 


0.2 187o 




Cl 


0.650 


0.690 


0.746 


0.767 


0.790 


0.862 


relMbb,, (c\) 


0.756% 


0.615% 


0.08 /% 


A at 

0.036% 


A A 1 1 ftf 

0.013% 




0.5 C2 

relMbb n {C2) 


a <>n 


a Ain 
U.02U 


n "7 1 1 
U. / Iz 


U. /44 


a inn 
U. / / / 


A QAT 
U.OOZ 


0.230% 


0.191% 


0.015% 


A AAOO/ 

0.008% 


A AA1 O? 

0.003% 




C FZY 


0.539 


0.632 


0.716 


0.749 


0.782 


0.866 


relMab,, (c FZY J 


a taac 
U.ZUUTb 


A 1/1 QC/ 


n m i 07 
U. 021 /o 


A A1 1 C7 

U.U1 17o 


A AAC07 






0.413 


0.531 


0.686 


0.728 


0.770 


~ 




a £n 
4.032 


3.039 


2. 162 


2.008 


1 .879 




r = 1.0

  c_0                0.436    0.436    0.436    0.436    0.436    0.436
  relMSE_n(c_0)      2.716%   3.132%   0.746%   0.348%   0.149%
  c_1                0.320    0.340    0.369    0.380    0.394    0.436
  relMSE_n(c_1)      1.411%   1.610%   0.251%   0.076%   0.021%
  c_2                0.255    0.291    0.342    0.361    0.382    0.436
  relMSE_n(c_2)      0.876%   0.999%   0.123%   0.027%   0.006%
  c_FZY                       0.281    0.344    0.375    0.387    0.440
  relMSE_n(c_FZY)             0.892%   0.132%   0.063%   0.012%
  c_ex               0.001    0.125    0.286    0.334    0.366
  MSE_n(c_ex)        12.627   8.445    4.948    4.296    3.787





where

           order   determined by                                optimal among M-estimators
  c_0      f-o-o   num. solution of (2.10)                      to any IC
  c_1      s-o-o   num. solution of (4.3)                       in H (see Section 4.1)
  c_2      t-o-o   num. optimization of (2.17)                  in H (see Section 4.1)
  c_FZY            num. optimization of (4.16)                  to (47T8j-type ICs
  c_ex             num. optimization of the (num.) exact MSE    in H (see Section 4.1)

(4.3) is the s-o analogue to (2.10), which is derived in Corollary 4.2. A more detailed description of this table is located on page 13.
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Table 3: Minimax radii for second order asymptotics 





                          n = 5     n = 10    n = 30    n = 50    n = 100   n = ∞

  γ = ∞   r_γ             0.390     0.449     0.514     0.536     0.559     0.621
          c_1(r_γ)        0.776     0.749     0.729     0.725     0.722     0.718
          ρ_{1;γ}(r_γ)    16.27%    17.08%    17.71%    17.85%    17.96%    18.07%

  γ = 3   r_γ             0.481     0.496     0.518     0.524     0.534     0.548
          c_1(r_γ)        0.670     0.694     0.724     0.739     0.750     0.800
          ρ_{1;γ}(r_γ)    6.213%    6.773%    7.490%    7.751%    8.036%    8.836%

  γ = 2   r_γ             0.540     0.552     0.564     0.563     0.571     0.574
          c_1(r_γ)        0.609     0.637     0.675     0.695     0.707     0.770
          ρ_{1;γ}(r_γ)    2.987%    3.297%    3.692%    3.834%    3.988%    4.410%


respectively $\rho_{1;\gamma}$ and instead of $\rho_\gamma$. For finite $n$, however, we have to take into account that $r < \sqrt{n}$ always. Doing so we get Table 3, showing that there is not much variation in both $c_1(r_\gamma, \cdot)$ and $\rho_{1;\gamma}(r_\gamma, \cdot)$ for varying $n$.

So if $r$ is completely unknown, it is a good choice to use the M-estimator to Hampel-scores for $c \approx 0.7$: you will never have a larger inefficiency than the limiting 18%! Ex post this is one more argument why the H07-estimate survived in Sections 7.B.8 and 7.C.4 of the Princeton robustness study (Andrews et al. (1972)).
A table for the corresponding t-o minimax radii is available on the web-page.
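The recommendation above can be illustrated by a minimal sketch of a location M-estimator with clipped scores. It assumes that the Hampel-form scores reduce to $x \min\{1, c/|x|\}$ in the Gaussian location model with known scale 1; the function names, the bisection solver, and the sample data are illustrative choices and not taken from the paper.

```python
def hampel_scores(u, c):
    """Clipped score u * min(1, c/|u|): identity on [-c, c], constant outside."""
    return max(-c, min(c, u))


def m_estimate_location(xs, c=0.7, tol=1e-10, max_iter=200):
    """Solve sum_i psi_c(x_i - theta) = 0 by bisection (scale assumed known, = 1)."""
    lo, hi = min(xs), max(xs)

    def total(theta):
        return sum(hampel_scores(x - theta, c) for x in xs)

    # total() is nonincreasing in theta with total(lo) >= 0 >= total(hi)
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if total(mid) > 0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical sample: four "ideal" observations and one gross outlier
sample = [-0.3, 0.1, 0.4, -0.2, 25.0]
robust = m_estimate_location(sample, c=0.7)
classical = sum(sample) / len(sample)
```

In this toy sample the outlier only enters through its clipped score, so the robust estimate stays with the bulk of the data while the arithmetic mean is pulled away.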



5.2 Innocent-looking risk-maximizing contaminations 



In Huber (1997, p. 62), the author complains about ". . . the considerable confusion between the respective roles of diagnostics and robustness. The purpose of robustness is to safeguard against deviations from the assumptions, in particular against those that are near or below the limits of detectability." As worked out in Ruckdeschel (2006), the exact critical rate for these limits may be determined in a statistical way: For some prescribed outlier set OUT, let $p_0$ and $q_n = (1 - r_n) p_0 + r_n$ be the probability of OUT under the ideal model, and under convex contaminations of radius $r_n$, respectively. Considering the minimax test between these alternatives yields the exact critical rate $1/\sqrt{n}$: under a faster shrinking, $p_0$ cannot be separated from $q_n$ at all, while at a slower rate, asymptotically we can separate them without error.

Going one step further, for some given $1/\sqrt{n}$-shrinking neighborhoods of radius $r$, we would also like to know how "small" an outlier may be, while it is still harmful enough to distort the classically optimal procedure in a way that this procedure is beaten by some robust one.



5.2.1 The Cniper contamination

To a fixed radius $r$, in the preceding sections, we have found/discussed f-o-o and s-o-o ICs of Hampel-form with clipping height $c_j = c_j(r[, n])$, $j = 0, 1$. To these ICs we have derived families of contaminations achieving maximal risk on $\mathcal{Q}_n(r)$. By means of Theorem 2.1 b), these are induced by any contaminating measures under which $\eta_\theta(X^{di})$ is constantly either $b_j$ or $-b_j$ for $b_j = A_j c_j$, up to an event of probability $o(n^{-1})$. Out of these risk-maximizing contaminations, let us limit ourselves to those induced by Dirac masses at $x$:

$$Q_n(x) := [(1 - r_n) P_\theta + r_n I_{\{x\}}]^n \qquad (5.4)$$

Among these $Q_n(x)$, we seek the least "suspicious" looking contamination point $x$ in the sense that the region $\mathrm{OUT} := [x; \infty)$ [or $(-\infty; x)$] carries large ideal probability. With this region as outlier set in Ruckdeschel (2006), values of $x$ (or slightly above in absolute value) occurring more frequently than they should under the ideal situation are hardest to detect.

More precisely, in the general smooth parametric setup (compare Kohl et al. 



( |2010| ), assume that the observations are univariate; let SjJ' ' and S„ be ALEs to the 



classical optimal IC fj = I l A and the asMSEo-optimal IC rjb , respectively. In this 
setup we define 

Definition 5.1 The f-o cniper point xq is defined as xo,+ ifxo i+ > -xq_ and xq- else, 
where 

x , + := inf{* > 1 asMSE (S:: o) , Q„(x)) < asMSE (S„, Q„(x))} 

1 (5.5) 
x - := sup{x < asMSE (5* ', Q n (x)) < asMSE (S„, Q„(x))} 

Remark 5.2 (a) The name cniper point is due to H. Rieder; it alludes to the fact that this "Janus-type" contamination $Q_n(x_0)$ pretends to be nice, but to the contrary is in fact pernicious, "sniping" off the classically optimal procedure. . .

(b) The cniper concept is of course not bound to quadratic loss. In the obvious manner, the concept may be generalized for multivariate observations, if we define any $x_0$ of minimal absolute value as cniper point.

(c) To get rid of the dependency upon the radius $r$, in the examples we will use the minimax radii $r_\gamma(n)$ defined in the preceding section.

Correspondingly, in the setup of this paper and under ( |2.23[ ), let S ^ 1 ' be an M-estimator 



to the s-0-0 IC 77c, according to Corollary 4.2 



Definition 5.3 The s-o cniper point x\ is defined as X\ i+ if x\ + > — X\ _ and x\ - else, 
where 

x\ + := inf{jc > o| asMSEiOC', Q n {x)) < asMSEi(5„, Q„{x))} 

I , . (5-6) 

xi _ := sup{jc < J asMSEi(5!, c , Q n (x)) < asMSEi(5 , „, Q„(x))} 

Cniper contaminations and f/s-o-o ICs form saddle-points under (5.7)/(2.23):

Proposition 5.4 The pair (S n ° , Q n (xo)) is a saddlepoint for the class of all pairs 
(S n ,Qn)if 

l*)(*o)l < l%(*o)] V/3: \n b (x )\<b (5.7) 



where S „ are ALE's to ICs of form ( |2.8| l and Q n e Q„ w.r.t. f-o risk R. 

Under g23) , the same holds in the one-dimensional location model for the pair 

(Sn' > Qn(xiJ) w.r.t. s-o risk in Q(r). 
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Remark 5.5 A sufficient condition for is that A(x) = -A(-.v): Then for any b > 0, a h = is 
possible and, 

b 

AT 1 = EA/l T min{l, ■ -\<EAA r = I 

\AbA\ 

So At > I~ [ in the positive semi-definit sense, and hence for b s.t. \ijb(Xj)\ < b 

\ Vb (xj)\ = \A b A(Xj)\ > \I- [ A(xj)\ = \fj(.xj)\ (5.8) 

5.2.2 Error probabilities 

For numerical evaluations, we consider the Gaussian location model and the Gaussian location and scale model. In both models, $x_{j,+} = -x_{j,-}$, and without loss, we use $x_{j,+}$. For the as. tests between $q_n = p_0$ and $q_n > p_0$, alluded to in the beginning of this section, we note that

$$p_0 = P_\theta(X_i > x_j) = \Phi(-x_j), \qquad q_n = p_0 + \frac{r}{\sqrt{n}}\,(1 - p_0) \qquad (5.9)$$

As to the (f-o) as. minimax test, Ruckdeschel (2006, formula (6.1)) gives as as. risk

$$\varepsilon_\infty = \Phi\Bigl(-\frac{r}{2}\sqrt{\frac{1 - p_0}{p_0}}\Bigr) \qquad (5.10)$$


For s-o asymptotics, we instead use the finite-sample minimax test, i.e. the Neyman- 
Pearson test with equal Type-I and Type-II error. In our case this is a corresponding 
randomized binomial test. 
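The finite-sample test just described can be sketched as follows. The sketch assumes that the number of observations falling into the outlier region is $\mathrm{Bin}(n, p_0)$ under the ideal model and $\mathrm{Bin}(n, q_n)$ under contamination, with $p_0 = \Phi(-x)$ and $q_n = p_0 + (r/\sqrt{n})(1 - p_0)$; all function names are hypothetical, and the equal-error level is found by a simple bisection rather than by any particular method from the paper.

```python
import math
from statistics import NormalDist


def binom_pmf(n, p):
    """Probability mass function of Bin(n, p) as a list indexed by k."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]


def type2_error(n, p0, q, alpha):
    """Type-II error of the randomized Neyman-Pearson test of size alpha.

    For q > p0 the likelihood ratio is increasing in k, so the test rejects
    for large counts and randomizes on the boundary count.
    """
    pmf0, pmf1 = binom_pmf(n, p0), binom_pmf(n, q)
    size = power = 0.0
    for k in range(n, -1, -1):
        if size + pmf0[k] >= alpha:
            gamma = (alpha - size) / pmf0[k]  # randomization probability at k
            return 1.0 - (power + gamma * pmf1[k])
        size += pmf0[k]
        power += pmf1[k]
    return 0.0


def minimax_risk(n, p0, q, iterations=60):
    """Common value of Type-I and Type-II error (bisection on the size)."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if type2_error(n, p0, q, mid) > mid:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def cniper_probabilities(x, r, n):
    """p0 and q_n for the outlier region [x, inf) in the Gaussian location model."""
    p0 = NormalDist().cdf(-x)
    return p0, p0 + r / math.sqrt(n) * (1 - p0)


# Inputs taken from the n = 5 column of Table 4: r = 0.390, x_1(5) = 2.931
p0, q5 = cniper_probabilities(2.931, 0.390, 5)
risk_n5 = minimax_risk(5, p0, q5)
type2_n5 = type2_error(5, p0, q5, 0.05)
```

With these inputs the two quantities should land close to the entries 0.277 and 0.364 reported for $n = 5$ in Table 4.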

5.2.3 Gaussian location 

In the Gaussian location model, we draw all necessary expressions from Ruckdeschel (2010a, Prop.); in particular, with $c_1 = c_1(n, r_\gamma)$, and $A_1 = (2\Phi(c_1) - 1)^{-1}$, $b_1 = c_1 A_1$, by Theorem 2.1 b), maximizing risk amounts to either $X^{di} > c_1$ always or $X^{di} < -c_1$ always. The classically optimal estimator is the arithmetic mean $\bar X_n$, and one easily calculates

$$E_{Q_n(x)}[\bar X_n^2 \mid K = k] = \frac{1}{n^2}\,[k^2 x^2 + (n - k)] \qquad (5.11)$$

and integrating out $K$ we get directly, with $r_n = r/\sqrt{n}$,

$$n\,E_{Q_n(x)}[\bar X_n^2] = 1 - r_n + x^2\Bigl(r^2 + r_n - \frac{r^2}{n}\Bigr) \qquad (5.12)$$
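As a plausibility check of the step from (5.11) to (5.12), the following sketch sums the conditional second moment over the distribution of $K$ and compares it with the closed form. It assumes $K \sim \mathrm{Bin}(n, r/\sqrt{n})$ contaminated observations sitting exactly at $x$ and standard normal ideal observations; the function names are illustrative.

```python
import math


def second_moment_exact(n, r, x):
    """n * E[mean^2] under Q_n(x), summing (5.11) over K ~ Bin(n, r/sqrt(n))."""
    p = r / math.sqrt(n)
    total = 0.0
    for k in range(n + 1):
        pmf = math.comb(n, k) * p**k * (1 - p)**(n - k)
        # n times the conditional second moment (k^2 x^2 + (n - k)) / n^2
        total += pmf * (k**2 * x**2 + (n - k)) / n
    return total


def second_moment_closed(n, r, x):
    """Closed form 1 - r_n + x^2 (r^2 + r_n - r^2/n) with r_n = r/sqrt(n)."""
    r_n = r / math.sqrt(n)
    return 1 - r_n + x**2 * (r**2 + r_n - r**2 / n)


# Inputs taken from the n = 30 column of Table 4
exact_30 = second_moment_exact(30, 0.514, 2.101)
closed_30 = second_moment_closed(30, 0.514, 2.101)
```

Both functions agree up to floating-point error for any admissible $r < \sqrt{n}$, since the closed form only uses the first two moments of the binomial distribution.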
Combining this with formulas ( |2.17[ ) and ( |4.1| i, for M :- asMSE( ) (5!; i) ) we get 

M () -1 + 4(M +Z>2(r 2 + 1)+1) 

= — — n ; (5- 13 ) 



r 2 (l - l -)+ J r 



VMp^T | 1 r M +l+fr 2 (r 2 + l) VMp^T 1 . 
X\(ri)= + — - — ] + °(^) ( 5 - 14 ) 
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Table 4: Minimax contamination at y — 



  n                 5       10      30      50      100     200     300     ∞

  r_γ(n)            0.390   0.449   0.514   0.536   0.559   0.576   0.584   0.621
  c_1(r_γ, n)       0.776   0.749   0.729   0.725   0.722   0.720   0.719   0.718
  x_1(n)            2.931   2.470   2.101   2.004   1.914   1.853   1.826   1.714
  1 - β_n(0.05)     0.364   0.272   0.215   0.183   0.162   0.133   0.132   0.101
  ε_n               0.277   0.178   0.129   0.115   0.097   0.089   0.086   0.072


This yields the results in Table 4. We include the type-II error $1 - \beta(\alpha)$ for the Neyman-Pearson test to niveau $\alpha = 5\%$ and the risk $\varepsilon_n$ of the corresponding minimax test; roughly speaking, at niveau 5% (i.e., 5% of the ideal observations falsely marked as outliers) we cannot do better than overlooking one of 10 contaminations, and, equally weighting the two error types, we cannot do better than a false classification rate of 7% for each error type.



5.2.4 Gaussian location and scale 



To give one more example, consider the one-dimensional location-scale model at central distribution $\mathcal{N}(0, 1)$. For this model we have not yet established an s-o as. theory; for f-o asymptotics, however, we may use R-programs from the bundle RobASt, cf. Kohl (2005, Appendix D), and get $r_\gamma = 0.579$,

$$\max_{Q_n \in \mathcal{Q}_n(r_\gamma)} \mathrm{asMSE}(\eta_{b_0}, Q_n) = 3.123 \qquad (5.15)$$

while $I_\theta^{-1}\Lambda_\theta = (x, \tfrac{1}{2}(x^2 - 1))^T$. This gives $x_0 = 1.844$, and hence $\varepsilon_\infty = 5.737\%$ and $1 - \beta_\infty(5\%) = 6.557\%$. Condition (5.7) is proved to hold in subsection A.6.
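The numbers quoted for the location-scale example can be retraced with a short sketch. It rests on three assumptions that are not spelled out at this point of the text: the classical estimator has as. risk $\mathrm{tr}\, I^{-1} + r^2 |I^{-1}\Lambda(x)|^2$ with $\mathrm{tr}\, I^{-1} = 1.5$ (compare (A.24)); the as. minimax risk has the form $\Phi(-\tfrac{r}{2}\sqrt{(1 - p_0)/p_0})$; and the as. type-II error at level $\alpha$ is that of a one-sided Gaussian shift test with shift $r\sqrt{(1 - p_0)/p_0}$. Function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()


def cniper_point(r, max_risk, trace_inv_fisher=1.5):
    """Positive x0 with trace_inv_fisher + r^2 |I^{-1} Lambda(x0)|^2 = max_risk.

    For I^{-1} Lambda(x) = (x, (x^2 - 1)/2) the squared norm is (x^2 + 1)^2 / 4,
    so the equation can be solved in closed form.
    """
    norm_classical_ic = math.sqrt(max_risk - trace_inv_fisher) / r
    return math.sqrt(2.0 * norm_classical_ic - 1.0)


def detection_errors(x0, r, alpha=0.05):
    """Assumed as. minimax risk and as. type-II error at level alpha."""
    p0 = STD_NORMAL.cdf(-x0)
    shift = r * math.sqrt((1 - p0) / p0)
    minimax = STD_NORMAL.cdf(-shift / 2)
    type2 = STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - alpha) - shift)
    return minimax, type2


x0 = cniper_point(r=0.579, max_risk=3.123)
eps_inf, type2_inf = detection_errors(x0, r=0.579)
```

Under these assumptions the sketch comes out close to the values $x_0 = 1.844$, $5.737\%$ and $6.557\%$ stated above; small deviations in the last digit can be expected because the inputs 0.579 and 3.123 are themselves rounded.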



A Proofs 



A. 1 A Hoeffding Bound 

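Lemma A.1 Let $\xi_i \sim F$, $i = 1, \ldots, n$, be independent real-valued random variables, $0 \le \xi_i \le 1$. Then for $\mu = E[\xi_1]$ and $0 < \varepsilon < 1 - \mu$

$$\Pr\Bigl(\frac{1}{n}\sum_{i=1}^{n} \xi_i - \mu \ge \varepsilon\Bigr) \le \Bigl[\Bigl(\frac{\mu}{\mu + \varepsilon}\Bigr)^{\mu + \varepsilon} \Bigl(\frac{1 - \mu}{1 - \mu - \varepsilon}\Bigr)^{1 - \mu - \varepsilon}\Bigr]^{n} \qquad (A.1)$$

Proof Hoeffding (1963), Thm. 1, inequality (2.1). □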
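To see the bound at work in the binomial case used below, the following sketch compares the exact upper tail of $\mathrm{Bin}(n, r/\sqrt{n})$ with Hoeffding's bound in the form of Theorem 1, inequality (2.1) of Hoeffding (1963), and with the cruder bound $\exp(-r\sqrt{n}\,((1 + d)\log(1 + d) - d))$ that appears in the proof of Lemma A.2. The parameter values and function names are illustrative assumptions.

```python
import math


def hoeffding_bound(n, mu, eps):
    """Hoeffding (1963), Thm. 1, (2.1): bound on P(mean - mu >= eps)."""
    a = mu + eps
    log_term = a * math.log(mu / a) + (1 - a) * math.log((1 - mu) / (1 - a))
    return math.exp(n * log_term)


def binomial_upper_tail(n, p, threshold):
    """Exact P(B >= threshold) for B ~ Bin(n, p)."""
    k_min = math.ceil(threshold)
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_min, n + 1))


def relaxed_bound(n, r, d):
    """exp(-r sqrt(n) kappa) with kappa = (1 + d) log(1 + d) - d."""
    kappa = (1 + d) * math.log(1 + d) - d
    return math.exp(-r * math.sqrt(n) * kappa)


# Illustrative values: n = 100, r = 0.5, d = 0.5, i.e. p = 0.05, eps = 0.025
n, r, d = 100, 0.5, 0.5
p = r / math.sqrt(n)
eps = d * r / math.sqrt(n)
tail = binomial_upper_tail(n, p, n * (p + eps))
bound = hoeffding_bound(n, p, eps)
crude = relaxed_bound(n, r, d)
```

For moderate $n$ both bounds are far from tight; their use in the appendix lies in the exponential rate, not in the numerical value.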

To settle case (II) in the proof of Theorem |3.1| we need the following sharpening of Ruckdeschel p010b| 
Lem. A.2) 

Lemma A.2 Let k\{n) = 1 + d n and assume that for some 5 s (0, 1 /4), 

-» oo, d„n- [ > 2+6 ' -» forn-»cx> (A.2) 
Then if $\liminf_n d_n > 0$ there is some $c > 0$ such that

$$\Pr(\mathrm{Bin}(n, r/\sqrt{n}) > k_1(n) r\sqrt{n}) = o(e^{-c r \sqrt{n}}) \qquad (A.3)$$
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and, ifd„ = o(n°), for any < So < 26, it holds that 

$$\Pr(\mathrm{Bin}(n, r/\sqrt{n}) > k_1(n) r\sqrt{n}) = o(e^{-r n^{\delta_0}}) \qquad (A.4)$$

Remark A.3 Even if $d_n$ is increasing at a faster rate than $n^{1/2}$, assertion (A.3) remains true, as long as $\liminf_n d_n > 0$, but this is not needed here.

Proof Let

$$\kappa_n := \int_1^{k_1(n)} \log(x)\,dx \qquad (A.5)$$

Then $\kappa_n > 0$, as $\log(x) > 0$ for $x > 1$ and $k_1(n) > 1$. By the second assumption in (A.2), $d_n = o(\sqrt{n})$, so $0 < d_n r/\sqrt{n} < 1 - r/\sqrt{n}$ eventually in $n$ and Hoeffding's Lemma A.1 is available; applying it to the case of $n$ independent $\mathrm{Bin}(1, p)$ variables, we obtain for $B_n \sim \mathrm{Bin}(n, p_n)$, $p_n = r/\sqrt{n}$ and $\varepsilon = (k_1(n) - 1) r/\sqrt{n}$ (which is smaller than $1 - p_n$ eventually)

$$\Pr(B_n > k_1(n) r\sqrt{n}) \le \exp\Bigl(-k_1(n) r\sqrt{n}\,\log(k_1(n)) + (n - k_1(n) r\sqrt{n}) \times \bigl(\log(1 - \tfrac{r}{\sqrt{n}}) - \log(1 - k_1(n)\tfrac{r}{\sqrt{n}})\bigr)\Bigr)$$

But for $x_0 < x_1 \in (0, 1)$, $\log(1 - x_0) - \log(1 - x_1) = \int_{x_0}^{x_1} (1 - t)^{-1}\,dt < (x_1 - x_0)/(1 - x_1)$. Thus $\log(1 - r/\sqrt{n}) - \log(1 - k_1(n) r/\sqrt{n}) < \frac{d_n r/\sqrt{n}}{1 - k_1(n) r/\sqrt{n}}$ and

$$\Pr(B_n > k_1(n) r\sqrt{n}) \le \exp\bigl(-r\sqrt{n}\,(k_1(n)\log(k_1(n)) - k_1(n) + 1)\bigr) = \exp(-r\sqrt{n}\,\kappa_n)$$

If $\liminf_n d_n > 0$, by (A.5) $\liminf_n \kappa_n > 0$, and for any $0 < c < \liminf_n \kappa_n$, (A.3) follows. If $d_n = o(n^0)$, we note that

$$\kappa_n = (1 + d_n)\log(1 + d_n) - d_n = d_n^2/2 + o(d_n^2) \qquad (A.6)$$

which for any $\delta' > 0$ entails

$$\Pr(\mathrm{Bin}(n, r/\sqrt{n}) > k_1(n) r\sqrt{n}) = o\Bigl(\exp\Bigl(-\frac{r\, d_n^2 \sqrt{n}}{2 + \delta'}\Bigr)\Bigr)$$

Now for $d_n = o(n^0)$, by the first assumption in (A.2), for $0 < \delta_0 < 2\delta$ eventually in $n$, (A.4) holds as

2 + 5' 2 + 5' 



Another consequence of the exponential decay of (A.3)/(A.4) is that we may neglect values of $K > k_1(n) r\sqrt{n}$ when integrating along $K$.

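Corollary A.4 Let $K \sim \mathrm{Bin}(n, r/\sqrt{n})$. Then, in the setup of Lemma A.2, for any $j \in \mathbb{N}$,

$$E[K^j\, I_{\{K > k_1(n) r\sqrt{n}\}}] = o(e^{-r n^{d}}) \qquad (A.7)$$

for any $0 < d < 1/2$ if $\liminf_n d_n > 0$ and any $0 < d < \delta_0$ if $\lim_n d_n = 0$.

Proof $E[K^j\, I_{\{K > k_1(n) r\sqrt{n}\}}] \le n^j \Pr(K > k_1(n) r\sqrt{n}) = o(e^{-r n^{d}})$. □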
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A.2 Proof of TheoremlXTI 

In the risk, we have to treat stochastic arguments in 0, <p; this is settled in the following lemma: 

Lemma A.5 Let $F: \mathbb{R} \to \mathbb{R}$ be twice differentiable with Hölder-continuous second derivative and $G: \mathbb{R} \to \mathbb{R}$ be differentiable with Hölder-continuous derivative. Then there is a sequence $k_1(n) = 1 + d_n$ with $d_n \to 0$ according to (A.2) and some $\eta > 0$, such that for all $x, \beta \in \mathbb{R}$ and with $\tilde k = K/\sqrt{n}$,

$$E[F(x + \beta \tilde k) \mid K \le k_1(n) r\sqrt{n}] = F(x + \beta r) + F''(x + \beta r)\,\frac{\beta^2 r}{2\sqrt{n}} + o(n^{-1/2}) \qquad (A.8)$$

and

$$E[G(x + \beta \tilde k) \mid K \le k_1(n) r\sqrt{n}] = G(x + \beta r) + O(n^{-(1+\eta)/4}) \qquad (A.9)$$
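The second order term in (A.8) can be checked numerically. The sketch below evaluates the expectation exactly over $K \sim \mathrm{Bin}(n, r/\sqrt{n})$ and compares it with the expansion; as a simplification it drops the conditioning on $K \le k_1(n) r\sqrt{n}$, assuming (in line with Corollary A.4) that the excluded event is negligible. The test function $\sin$, the parameter values and all names are illustrative choices.

```python
import math


def log_binom_pmf(n, k, p):
    """log P(K = k) for K ~ Bin(n, p), via lgamma to avoid overflow."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def expected_value(func, n, r, x, beta):
    """Exact E[func(x + beta * K / sqrt(n))] for K ~ Bin(n, r / sqrt(n))."""
    root_n = math.sqrt(n)
    p = r / root_n
    return sum(math.exp(log_binom_pmf(n, k, p)) * func(x + beta * k / root_n)
               for k in range(n + 1))


def second_order_approx(func, func_dd, n, r, x, beta):
    """Right-hand side of (A.8) without the o(n^{-1/2}) remainder."""
    point = x + beta * r
    return func(point) + func_dd(point) * beta**2 * r / (2 * math.sqrt(n))


def neg_sin(t):
    """Second derivative of sin."""
    return -math.sin(t)


n, r, x, beta = 2500, 1.0, 0.3, 1.0
exact = expected_value(math.sin, n, r, x, beta)
approx = second_order_approx(math.sin, neg_sin, n, r, x, beta)
first_order_only = math.sin(x + beta * r)
```

For these values the second order approximation is markedly closer to the exact expectation than the plain plug-in value $F(x + \beta r)$, which is the content of (A.8).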



Proof Using the Taylor approximation of $\log(1 + x)$, we get for $n$ sufficiently large

$$d_n^2/3 < d_n^2/2 - d_n^3/6 < \kappa_n < d_n^2/2 \qquad (A.10)$$

By (A.4) of Lemma A.2, for some $\delta_0$ and eventually in $n$ we have $P(K > k_1(n) r\sqrt{n}) \le \exp(-r n^{\delta_0})$, and by the same argument we also get that $P(K < (2 - k_1(n)) r\sqrt{n}) \le \exp(-r n^{\delta_0})$. Hence,

$$P(|\tilde k - r| > r d_n) \le 2\exp(-r n^{\delta_0}) \qquad (A.11)$$

Thus, as $F$, $G$ are bounded, the contribution of the set $\{|\tilde k - r| > r d_n\}$ decays exponentially, while on the complement we have a uniformly bounded Taylor expansion up to order 2 respectively 1 for the integrands:

$$F(x + \beta \tilde k) = F(x + \beta r) + F'(x + \beta r)\beta(\tilde k - r) + F''(x + \beta r)\beta^2(\tilde k - r)^2/2 + o(|\tilde k - r|^{2+\eta})$$
$$G(x + \beta \tilde k) = G(x + \beta r) + G'(x + \beta r)\beta(\tilde k - r) + o(|\tilde k - r|^{1+\eta})$$

Integrating these expansions out in $\tilde k$, we see that the first contribution to the Taylor series for $F$ is the quadratic term, which is $F''(x + \beta r)\frac{\beta^2}{2}\mathrm{Var}\,\tilde k$, and the remainder is $o(n^{-1/2})$. For $G$, the first contribution to the error term is the remainder, hence of form $\mathrm{const}\,|\tilde k - r|^{1+\eta}$. By the Hölder inequality this gives a bound $\mathrm{const}\,[\mathrm{Var}\,\tilde k]^{(1+\eta)/2} = O(n^{-(1+\eta)/4})$. □
For the proof of Theorem 3.1 we use a tableau like the one of Ruckdeschel (2010a, p. 19), i.e., to derive the result, we partition the integrand according to





K < k[ (n)r *Jn 


k\{n)r < K < son 


\t\ < k 2 b I \og(n)/n 


(I) 


(II) 


k2b l \og(n)/n < \t\ 


(in) 



with $k_1(n)$ according to (A.2). This time, no integration w.r.t. $t$ is needed, so case (IV) from Ruckdeschel (2010a) may be canceled, which is why we may dispense with assumption (Pd) and pass to the unrestricted neighborhoods $\mathcal{Q}_n$. Cases (II) and (III) may be taken over unchanged from Ruckdeschel (2010a, Proof of Thm. 3.5), so we may confine ourselves to case (I):

We use or], 0:2 from (3.2) and proceed paralleling the proof in Ruckdeschel 1 2010a I and get from for- 
mula (A. 1 8) therein that Pr(5 „ < - % \ D k j ) = G„( - ) + 0(« J// ). So we have to spell out s n>k ( ^S- ), 
which gives 

s »-*(^t ) = v '{(-^ - ai) + - amifi + «i) - |<*?]} + o(i ) (A.12) 



and hence — setting J = Snjc(~j^ ) ar| d *l = — + t )/vo as in 



Ruckdeschel 



2010a 



Pr(S„<-^\D kJ ) = 0(s)- V (s)^p( v - 



$(h)+ f^ o [a 1 k-l 2 a 2 -2(a l + fWl -v f (sf- 1)] + o(^= XA.13) 
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This term is maximized eventually in n, if — r is maximal or, ess entia lly equivalent, all contaminating 
mass (up to mass o(n~'' 2 )) is concentrated left of y„ from Section 2.2 and then ft = k^b, and after the 



substitution according to k := k/ *Jn, Ic := k/ Vn, this gives with = — (ori + kb)/vQ 
Pr(S„ < -^|D^)= ®(h)+^lmk-l 2 c4-2s kV ohai-vofC4 ~ l)-Pb] + o(± ) (A.14) 



Now, by 43-6) , it holds that Si = —{a\ + rb)/vo, so that by an application of Lemma A. 5 for Q% _ any 
sequence of measures according to (2.20} 



Q°„.JS„ < -Sfc) = 0(s l ) + o(i ) + - ^a 2 + _ « (J 2 _ 1} _ ^ -r 2 ^] 

Correspondingly, we get for any sequence of measures Q* according to (2.21) 

Ql, + (S„ > %) = + o(i ) + >(*l)[4<*2 + ±a 2 - won**! + ?(S? - D - r^ fl + r 2 ^] 

We next account for order -4= -terms and get, as (5' = 0(-t= ) 
y« V" 

Q° _(S„ < - J ) = e°-(5„ < ) + aV(^) + o(Jg) (A. 15) 

and analogously for Q° + (S„ > ^ ), so 5' = i(- £ - ±(a 2 + <5 2 )- vmnS- f (j? - D+ ^ + ^) 
and Ql_(S„ < ) = 2° + (S„ > J ) + o(i ), i.e., 

QP_(S^-^) = *(^) + ^)Jg[^ + 2^-« 1 ? I -I^ + ^] + o(^) (A.16) 



A.3 Proof of Corollary 4.2 



The assumptions of Theorem 4.1 are clearly fulfilled. Hence we may start with the verification of (4.3):



G(w, s) = (w 2 + .« 2 )(1 + 4t ) + w 2 (l + -) 
-yn V" r 

a„G(w,i) = 2w[l + 4=r + ~(1 + ^)], a s G(w,i) = 2s[l + 

-yn V' 1 r V" 



(A. 17) 
(A.18) 



and hence, dividing both sides of (4.2) by 2Avo, we get the assertion. The LHS of (4.3) (with or without 
factor 1 + ) i s isotone, the RHS antitone in c. Thus if we insert the factor to correct the f-o-o clipping 

height Co to ci(n), the factor increases the LHS without affecting the RHS. This can only be compensated 
for by a decrease of $c_0$ to $c_1(n)$. If $h(c)$ is differentiable in $c_0$ with derivative $h'(c_0)$, (4.4) is an application of the implicit function theorem: Let $G(s, c) := r^2 c\,(1 + s) - h(c)$. Then $G(0, c_0) = 0$. Hence for $s = (r^2 + 1)/(r^2 + r\sqrt{n})$, up to $o(n^{-1/2})$,

$$c_1(n) = c_0 - \frac{G_s(0, c_0)}{G_c(0, c_0)}\, s = c_0\Bigl(1 - \frac{1}{\sqrt{n}}\,\frac{r^3 + r}{r^2 - h'(c_0)}\Bigr)$$



□ 
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A. 4 Proof of Proposition [43] 

We apply Rieder (1994 Theorem 1.4.7) to the derivatives; this theorem says that for rj e Ci(HL) with 
7](8q) = 0, 77' (So) for some 8q e R, there exists an open neighborhood Vo c Ci(R) such that for every 
open, connected neighborhood V c Vo of r] there is a unique, continuous map T : V — > R with 

rw = o , f(T(f)) = 0, feV (A. 19) 

even more so, T is continuously bounded differentiable on V with derivative at tangent h 

dT(f)h = -h(T(f))lf'{T(f)) (A.20) 

Hence there is an open neighborhood Vo-f of F such that for each connected open neighborhood Vp c 
Vq ; f, we get a unique, continuously bounded differentiable map T : Vf — * R with 

T(F) = x , f'(T(f)) = 0, feV F , dT(f)h = -h'(T(f))/f"(T(f)) (A.21) 

But by assumption {4.5\ from some n on, F„ and G„ will lie in Vo;F, and setting j„ = T(F n ), by (A.21) 
i^fe) = 0, and 

\Xn - Xo\ = \T(F n ) - T(F)\ < \F'„(xo)\/F"(x ) = 0(n^) 
which is (b); again by \4.5\ , 

- F"(x )\ < \F'„'(x n ) - F"(x n )\ + \F"(x„) - F"(xo)\ < sup \F%(x) - F"(x)\ + o(n°) = o(n°) 



In particular, eventually in n, F'„'(x„) > and hence x„ is a minimum of F, so (a) is shown. By (4.5) , 
sup x |F - G„\ + \F' - G' n \ + \F" - G;,'| = 0(n~^), so (c) follows just as (a). For (d) we note 

l-v„ - y„\ = \T(F„) - T(G n )\ < \G'„(x n )\/F',;(x n ) ( = } |G>„)|/(/ 2 + o(n )) = OirCh 
To show (e), we introduce d„ := y„ - x„ and write 

< G n (x n ) - G„Cv„) = G' n (y n )d n + g;,'C>'„M;/2 + o(dl) = (h + °(n°))dl/2 + o(^) = 0(«- 2 ") (A.22) 

□ 



A. 5 Proof of Proposition 5.4 



We show that under the assumptions of this proposition Xi indeed defines a "uniformly bad contamination" 
in the sense that for the fixed contamination Q n (Xj) 

asMSE (S!; Jo1 , Q„(x )) = minasMSE (5«", Q„(x )) (A.23) 
b>0 



Rieder 



1994 



resp. asMSE^S^ 1 ', Q n (xi)) = min c> o asMSKi (S';f, Q„(xi)) In case j = 0, as in the setup of 
chap. 5), we obtain 

asMSEoGC, Q n (xo)) = tr CoVufe) + r 2 \r, h (x f, asMSE (S„, Q n (x )) = tr I + r 2 \fj(x a )\ 2 (A.24) 
Now for given xq, either \r] (b) (x )\ < b or 

\rf b) (x )\ = b. In the first case, \5.1\ applies and hence 



asMSE (5';," ', Q„(x )) > asMSE (S„, Q„(x )) (A.25) 
In the latter, Q„(xo) already achieves maximal as. risk for Sjf' on Q„, and hence by minimaxity of sj,* ' 

asMSEo(S»', Q„(x )) > asMSE (5;;' ', Q„{x )) (A.26) 
For the case $j = 1$ one argues in an analogous way. □
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A.6 Proof for ( |5.7[ ) in the Gaussian location scale model 

We abbreviate the location and scale parts by indices $l$ and $s$, respectively. By equivariance we may limit ourselves to the case $\theta = (0, 1)^T$. Due to symmetry, $A = A(b)$ from (2.8) is diagonal for all $b$ with elements $A_l$ and $A_s$, and we may write

rj b = Y mm{\, bl\Y\), Y T = (A,x,A s (x 2 - 1 - Zj )) (A.27) 

The centering z s (b) after the clipping is necessary, as the scale part is not skew symmetric; in the pure scale 
case (with known 8/), the corresponding centering z' t = z' s (b) is antitone in b, because A, is monotone in x 2 : 
It decreases from to [$~'(3/4)] 2 - 1 = -0.545 =: z. In the combined case, we never reach this extremal 
case due to the additional location part — compare Kohl (2005 Remark 8.2.1(a)) where Zs = a sc /a — 1 = 
-0.530; in any case, z s > -1 always. Hence in particular, for xg = 1.844 and b such that IJ7' '0co)l < b it 
holds that 

\ V f\xo)\ = A s (b)\xl - 1 - Zs (b)\ > A s (b)\x 2 - 1| > r s \x\ - 1| = |^(x )l (A.28) 
and thus in particular, 

l>7 (i,) (.*o)| 2 = \lf\xo f + \rif\x )\ 2 ) = IrjfHxof+Ao-M^ > Mxo? + I?4 = W x ot (A.29) 

□ 
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